Abstract

We present an algorithm for approximating semidefinite programs with running time that is sublinear in the number of entries in the semidefinite instance. We also present lower bounds that show our algorithm to have a nearly optimal running time.¹



1 Introduction 

We consider the following problem known as semidefinite programming 

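$$\text{Find } X \succeq 0 \quad \text{subject to } A_i \bullet X \ge b_i, \quad i = 1, \dots, m \tag{1}$$

where for all $i \in [m]$, $A_i \in \mathbb{R}^{n \times n}$ is w.l.o.g. symmetric and $b_i \in \mathbb{R}$.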

Definition 1.1 ($\epsilon$-approximated solution). Given an instance of SDP of the form (1), a matrix $X \in \mathbb{R}^{n \times n}$ will be called an $\epsilon$-approximated solution if $X$ satisfies:

1. $A_i \bullet X \ge b_i - \epsilon$ for all $i \in [m]$

2. $X \succeq -\epsilon I$

The main result of this paper is stated in the following theorem. 



¹ This work is a continuation and improvement of the sublinear SDP algorithm in [1].

Theorem 1.2. There exists an algorithm that, given $\epsilon > 0$ and an instance of the form (1) such that for all $i \in [m]$, $\|A_i\|_F \le 1$, $|b_i| \le 1$, and there exists a feasible solution $X^*$ such that $\|X^*\|_F \le 1$, returns an $\epsilon$-approximated solution with probability at least 1/2.

The running time of the algorithm is $O\left( \frac{m \log m}{\epsilon^2} + \frac{n^2 \log m \log n}{\epsilon^{2.5}} \right)$.



Our upper bound is completed by the following lower bound that states that 
the running time of our algorithm is nearly optimal. 

Theorem 1.3. Given an instance of the form (1) such that for all $i \in [m]$, $\|A_i\|_F \le 1$ and $|b_i| \le 1$, any algorithm that with probability at least 1/2 does the following: either finds a matrix $X$ such that $X$ is an $\epsilon$-approximated solution and $\|X\|_F \le 1$, or declares that no such matrix could be found, has running time at least $\Omega(\epsilon^{-2}(m + n^2))$.

2 Preliminaries 

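Denote the following sets:

$$\mathbb{B}_F = \{X \in \mathbb{R}^{n \times n} \mid \|X\|_F \le 1\}$$

$$\Delta_{m+1} = \Big\{p \in \mathbb{R}^m \ \Big|\ \forall i \in [m]\ p_i \ge 0,\ \sum_{i=1}^m p_i \le 1\Big\}$$

$$\mathbb{S}_+ = \{X \in \mathbb{R}^{n \times n} \mid X \succeq 0,\ \mathrm{Tr}(X) \le 1\}$$

We consider the following concave-convex problem

$$\max_{X \in \mathbb{B}_F} \ \min_{p \in \Delta_{m+1},\, Z \in \mathbb{S}_+} \ \sum_{i=1}^m p_i (A_i \bullet X - b_i) + Z \bullet X \tag{2}$$

The following claim establishes that in order to approximate (1) it suffices to approximate (2).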

Claim 2.1. Given a feasible SDP instance of the form (1), let $X \in \mathbb{B}_F$ be such that

$$\min_{p \in \Delta_{m+1},\, Z \in \mathbb{S}_+} \ \sum_{i=1}^m p_i (A_i \bullet X - b_i) + Z \bullet X \ge -\epsilon$$

Then $X$ is an $\epsilon$-approximated solution.

Proof. Define $\mathrm{Val}(X) = \min_{p \in \Delta_{m+1},\, Z \in \mathbb{S}_+} \sum_{i=1}^m p_i (A_i \bullet X - b_i) + Z \bullet X$. For all $i \in [m]$ it holds, by setting the dual variables to $p_i = 1$, $p_j = 0$ for all $j \ne i$ and $Z = 0_{n \times n}$, that

$$A_i \bullet X - b_i \ge \mathrm{Val}(X) \ge -\epsilon$$

Also, for any vector $v \in \mathbb{R}^n$ such that $\|v\|_2 \le 1$ we set the dual variables to $p_i = 0$ for all $i$ and $Z = vv^\top$, and thus it holds that

$$v^\top X v \ge \mathrm{Val}(X) \ge -\epsilon$$

which implies that $X \succeq -\epsilon I$. □



3 The Algorithm 

In this section we present our algorithm that approximates the max-min objective in (2) up to a desired additive factor of $\epsilon$. Our algorithm can be viewed as a primal-dual algorithm that works in iterations, performing on each iteration a primal improvement step and a dual one. For this task we make use of online convex optimization algorithms, which are known to be useful for solving concave-convex problems.

Consider the function $\mathcal{L} : \mathbb{B}_F \times \Delta_{m+1} \times \mathbb{S}_+ \to \mathbb{R}$ given by

$$\mathcal{L}(X, p, Z) = \sum_{i=1}^m p_i (A_i \bullet X - b_i) + Z \bullet X$$

The primal variable $X$ is updated by an online stochastic gradient ascent algorithm which updates $X$ by

$$X_{t+1} \leftarrow X_t + \eta \tilde{\nabla}_t$$

where $\tilde{\nabla}_t$ is an unbiased estimator for the derivative of $\mathcal{L}(X, p, Z)$ with respect to the variable $X$, that is $\mathbb{E}[\tilde{\nabla}_t \mid p, Z] = \sum_{i=1}^m p_i A_i + Z$. The parameter $\eta$ is the step size. Note that after such an update the point $X_{t+1}$ may be outside of the set $\mathbb{B}_F$ and we need to project it back to the feasible set, which only requires normalizing the Frobenius norm. Since we assume that the matrices $A_i$ are symmetric, the primal variable $X$ is also always a symmetric matrix.
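The projected update described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: matrices are plain lists of rows, and the helper names `frobenius_norm`, `project_to_unit_ball` and `ascent_step` are hypothetical.

```python
import math


def frobenius_norm(X):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in X for x in row))


def project_to_unit_ball(Y):
    """Euclidean projection onto {X : ||X||_F <= 1}: rescale only if outside."""
    scale = max(1.0, frobenius_norm(Y))
    return [[y / scale for y in row] for row in Y]


def ascent_step(X, G, eta):
    """One projected step X <- Proj(X + eta * G) for a gradient estimate G."""
    Y = [[x + eta * g for x, g in zip(rx, rg)] for rx, rg in zip(X, G)]
    return project_to_unit_ball(Y)


# A point outside the ball is rescaled; a point inside is left unchanged.
outside = project_to_unit_ball([[3.0, 0.0], [0.0, 4.0]])
inside = project_to_unit_ball([[0.1, 0.0], [0.0, 0.1]])
```

Because the feasible set is a Frobenius-norm ball, the projection is a single rescaling, which is why it costs no more than the update itself.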

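The dual variable $p$, which imposes weights over the constraints, is updated by a variant of the well-known multiplicative weights (MW) algorithm, which performs the following updates:

$$w_{t+1}(i) \leftarrow w_t(i)\left(1 - \eta v_t(i) + \eta^2 v_t(i)^2\right), \qquad p_{t+1}(i) \leftarrow \frac{w_{t+1}(i)}{\|w_{t+1}\|_1}$$

where $v_t(i)$ stands for the (estimated) value of $A_i \bullet X_t - b_i$, as in Algorithm 1, and $w$ is the vector of weights prior to the normalization to have $\ell_1$ norm equal to 1. This update increases the weight of constraints that are not satisfied well by the current primal solution $X_t$.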

The MW algorithm produces vectors $p_t$ which lie in the simplex, that is $\sum_{i=1}^m p_t(i) = 1$. In our case we want to allow the sum of entries in $p_t$ to be less than 1. We enable this by artificially adding an additional constraint to the SDP instance of the form $0_{n \times n} \bullet X \ge 0$, and running the MW algorithm with dimension $m + 1$. By the MW update rule, the unnormalized weight of the entry $p_{m+1}$ is fixed on all iterations, and its entire purpose is to allow the sum of the first $m$ entries to be less than 1. The added constraint is of course always satisfied and thus it does not affect the optimization.
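The effect of the dummy constraint can be seen in a small Python sketch. It assumes the multiplicative rule $w(i) \leftarrow w(i)(1 - \eta v(i) + \eta^2 v(i)^2)$ as read from the weight update in Algorithm 1; the function names `mw_update` and `dual_weights` and the toy numbers are hypothetical.

```python
def mw_update(w, v, eta):
    """Multiplicative step w(i) <- w(i) * (1 - eta*v(i) + eta^2 * v(i)^2).

    v(i) plays the role of an estimate of the slack A_i . X_t - b_i, so a
    violated constraint (negative v) gains weight.
    """
    return [wi * (1.0 - eta * vi + (eta * vi) ** 2) for wi, vi in zip(w, v)]


def dual_weights(w):
    """Normalize as if one extra dummy weight, fixed at 1, were appended.

    The returned entries are positive and sum to strictly less than 1.
    """
    total = sum(w) + 1.0
    return [wi / total for wi in w]


# Toy example with three constraints: the first is violated, the last has slack.
w = mw_update([1.0, 1.0, 1.0], [-0.5, 0.0, 0.5], eta=0.1)
p = dual_weights(w)
```

The mass missing from `sum(p)` is exactly the share of the dummy entry, whose estimate is always zero and whose unnormalized weight therefore never changes.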
An additional issue with the MW updates is that they require computing on each iteration the products $A_i \bullet X_t$ for all $i \in [m]$, which takes linear time in the number of entries in the SDP instance. We overcome this issue by sampling these products instead of computing them exactly. Given the matrix $X$ we estimate the product $A_i \bullet X$ by

$$\tilde{v}_i \leftarrow \frac{A_i(j, l)\, \|X\|_F^2}{X(j, l)} \quad \text{with probability } \frac{X(j, l)^2}{\|X\|_F^2}$$

It holds that $\mathbb{E}[\tilde{v}_i \mid X] = A_i \bullet X$.
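The unbiasedness claim can be checked exactly on a small instance by enumerating every entry the sampler could pick. The sketch below is illustrative only: the matrices are made-up toy data and the helper names `inner` and `estimator_distribution` are hypothetical.

```python
def inner(A, X):
    """Entrywise inner product A . X = sum over (j, l) of A[j][l] * X[j][l]."""
    return sum(a * x for ra, rx in zip(A, X) for a, x in zip(ra, rx))


def estimator_distribution(A, X):
    """All (probability, estimate) pairs of the single-entry estimator.

    Entry (j, l) is drawn with probability X[j][l]^2 / ||X||_F^2 and yields
    the estimate A[j][l] * ||X||_F^2 / X[j][l]; zero entries of X are never drawn.
    """
    sq = inner(X, X)  # ||X||_F^2
    return [(x * x / sq, a * sq / x)
            for ra, rx in zip(A, X) for a, x in zip(ra, rx) if x != 0.0]


A = [[0.5, -0.2], [-0.2, 0.1]]
X = [[0.3, 0.4], [0.4, -0.5]]
dist = estimator_distribution(A, X)
mean = sum(prob * est for prob, est in dist)  # exact expectation of the estimator
```

A single draw from this distribution gives an estimate for every constraint at once, since the same sampled entry $(j, l)$ can be reused across all $A_i$.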

On the downside, the estimates $\tilde{v}_i$ are unbounded, whereas boundedness is important for obtaining high-probability concentration guarantees. We overcome this difficulty by clipping these estimates, taking $v_i \leftarrow \max\{\min\{\tilde{v}_i, \eta^{-1}\}, -\eta^{-1}\}$. Note that $v_i$ is no longer an unbiased estimator of $A_i \bullet X$; however, the resulting bias is of the order of $\epsilon$ and thus does not hurt our analysis. Since the values $v_i$ may still be large, we use the variance of these variables to get better concentration guarantees. It holds that

$$\mathbb{E}[v_i^2 \mid X] \le \mathbb{E}[\tilde{v}_i^2 \mid X] = \|A_i\|_F^2 \|X\|_F^2$$
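The second-moment identity and the effect of clipping can likewise be verified exactly by enumeration. This is a sketch on made-up toy matrices (all entries of `X` nonzero, so the identity holds with equality); `clip` and `second_moments` are hypothetical helper names.

```python
def clip(v, c):
    """Clip v to the interval [-c, c]."""
    return max(-c, min(c, v))


def second_moments(A, X, eta):
    """Exact second moments of the raw and the clipped single-entry estimator."""
    sq_x = sum(x * x for row in X for x in row)  # ||X||_F^2
    raw = clipped = 0.0
    for ra, rx in zip(A, X):
        for a, x in zip(ra, rx):
            if x == 0.0:
                continue  # such an entry is sampled with probability zero
            prob = x * x / sq_x
            est = a * sq_x / x
            raw += prob * est ** 2
            clipped += prob * clip(est, 1.0 / eta) ** 2
    return raw, clipped


A = [[0.9, 0.1], [0.1, 0.2]]
X = [[0.05, 0.6], [0.6, 0.3]]  # the small entry X[0][0] produces a large estimate
raw, clipped = second_moments(A, X, eta=0.25)
```

Here the estimate coming from entry (0, 0) exceeds $1/\eta = 4$ and is clipped, so the clipped second moment is strictly smaller than the raw one.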

Finally the dual variable $Z$, unlike the variables $X, p$ which are updated incrementally, is always locally optimized by choosing

$$Z \leftarrow \arg\min_{Z \in \mathbb{S}_+} Z \bullet X$$

Here we note that in case $X$ is not PSD then without loss of generality $Z$ is always a rank-one matrix $zz^\top$ such that $z$ is a unit eigenvector of $X$ corresponding to the most negative eigenvalue of $X$. In case $X$ is PSD then $Z = 0_{n \times n}$. In any case $\|Z\|_F \le 1$. $Z$ could be approximated quite fast using an eigenvalue algorithm such as the Lanczos method. It will suffice to find a matrix $Z$ such that the product $Z \bullet X$ is $O(\epsilon)$ far from the true minimum.
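To make the choice of $Z$ concrete, the following Python sketch approximates the most negative eigenvalue of a symmetric matrix. Note the substitution: it uses plain shifted power iteration rather than the Lanczos method mentioned above, purely because it is short; it converges more slowly and carries no $O(\epsilon)$ guarantee for a fixed iteration count. The name `min_eig_oracle`, the iteration count and the seed are assumptions of this sketch.

```python
import math
import random


def matvec(X, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]


def min_eig_oracle(X, iters=1000, seed=0):
    """Approximate argmin over Z in S_+ of Z . X for a symmetric matrix X.

    Runs power iteration on (c*I - X) with c = ||X||_F, whose top eigenvector
    is a bottom eigenvector of X. Returns (z, lam) with lam = z^T X z and
    ||z||_2 = 1, so that Z = z z^T, or (None, 0.0) when lam >= 0, in which
    case Z = 0 is used.
    """
    n = len(X)
    c = math.sqrt(sum(x * x for row in X for x in row))  # c >= |eigenvalue| for all eigenvalues
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        Xz = matvec(X, z)
        z = [c * zi - xzi for zi, xzi in zip(z, Xz)]  # apply (c*I - X)
        norm = math.sqrt(sum(zi * zi for zi in z))
        if norm == 0.0:
            return None, 0.0
        z = [zi / norm for zi in z]
    lam = sum(zi * xzi for zi, xzi in zip(z, matvec(X, z)))
    return (z, lam) if lam < 0.0 else (None, 0.0)


z, lam = min_eig_oracle([[0.0, 0.5], [0.5, 0.0]])  # eigenvalues are +0.5 and -0.5
```

With $Z = zz^\top$ the product $Z \bullet X$ equals $z^\top X z$, which is the value `lam` returned above.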
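Finally the algorithm returns the average of all primal iterates.

Algorithm 1 SublinearSDP

 1: Input: $\epsilon > 0$, $A_i \in \mathbb{R}^{n \times n}$, $b_i \in \mathbb{R}$ for $i \in [m]$.
 2: Let $T \leftarrow 20^2 \cdot 40\, \epsilon^{-2} \log m$, $\eta \leftarrow \sqrt{40 \log m / T}$, $\epsilon' \leftarrow \epsilon/4$.
 3: Let $Y_1 \leftarrow 0_{n \times n}$, $w_1 \leftarrow 1_m$.
 4: Let $A_{m+1} = 0_{n \times n}$, $b_{m+1} = 0$.
 5: for $t = 1$ to $T$ do
 6:   $X_t \leftarrow Y_t / \max\{1, \|Y_t\|_F\}$.
 7:   $p_t \leftarrow w_t / (\|w_t\|_1 + 1)$.
 8:   $Z_t \leftarrow Z \in \mathbb{S}_+$ s.t. $Z \bullet X_t \le \min_{Z \in \mathbb{S}_+} Z \bullet X_t + \epsilon'$.
 9:   $i_t \leftarrow i \in [m]$ w.p. $p_t(i)$ and $i_t \leftarrow m + 1$ w.p. $1 - \sum_{i=1}^m p_t(i)$.
10:   $Y_{t+1} \leftarrow Y_t + \frac{1}{\sqrt{2T}} (A_{i_t} + Z_t)$
11:   Choose $(j_t, l_t) \in [n] \times [n]$ by $(j_t, l_t) \leftarrow (j, l)$ w.p. $X_t(j, l)^2 / \|X_t\|_F^2$.
12:   for $i \in [m]$ do
13:     $\tilde{v}_t(i) \leftarrow A_i(j_t, l_t) \|X_t\|_F^2 / X_t(j_t, l_t) - b_i$
14:     $v_t(i) \leftarrow \mathrm{clip}(\tilde{v}_t(i), 1/\eta)$
15:     $w_{t+1}(i) \leftarrow w_t(i) (1 - \eta v_t(i) + \eta^2 v_t(i)^2)$
16:   end for
17: end for
18: return $\bar{X} = \frac{1}{T} \sum_{t=1}^T X_t$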



4 Analysis 

The following lemma gives a bound on the regret of the MW algorithm (line 
15), suitable for the case in which the losses are random variables with bounded 
variance. For a proof see [2] Lemma 2.3. 

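Lemma 4.1. The MW algorithm satisfies

$$\sum_{t \in [T]} p_t^\top q_t \le \min_{i \in [m]} \sum_{t \in [T]} \max\{q_t(i), -\tfrac{1}{\eta}\} + \frac{\log m}{\eta} + \eta \sum_{t \in [T]} p_t^\top q_t^2$$

Here $q_t^2$ denotes the vector of entrywise squares of $q_t$.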

The following lemma gives concentration bounds on our random variables 
from their expectations. The proof is given in the appendix. 



Lemma 4.2. For $1/4 \ge \eta \ge \sqrt{\frac{40 \log m}{T}}$, with probability at least $1 - O(1/m)$, it holds that

(i) $\max_{i \in [m]} \left| \sum_{t \in [T]} (A_i \bullet X_t - b_i) - \sum_{t \in [T]} v_t(i) \right| \le 3\eta T$

(ii) $\left| \sum_{t \in [T]} (A_{i_t} \bullet X_t - b_{i_t}) - \sum_{t \in [T]} p_t^\top v_t \right| \le 4\eta T$

The following lemma gives a regret bound on the online gradient ascent algorithm used in our algorithm (line 10). For a proof see [3].



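Lemma 4.3. Consider matrices $M_1, \dots, M_T \in \mathbb{R}^{n \times n}$ such that for all $t \in [T]$, $\|M_t\|_F \le \rho$. Let $X_1 = 0_{n \times n}$ and for all $t \ge 1$ let $Y_{t+1} = X_t + \frac{\sqrt{2}}{\rho \sqrt{T}} M_t$ and $X_{t+1} = \arg\min_{X \in \mathbb{B}_F} \|Y_{t+1} - X\|_F$. Then

$$\max_{X \in \mathbb{B}_F} \sum_{t \in [T]} M_t \bullet X - \sum_{t \in [T]} M_t \bullet X_t \le 2\rho\sqrt{2T}$$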

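We are now ready to prove our main theorem, Theorem 1.2.

Proof. By applying Lemma 4.3 with parameters $M_t = A_{i_t} + Z_t$ and $\rho = 2$ we get

$$\max_{X \in \mathbb{B}_F} \sum_{t \in [T]} (A_{i_t} + Z_t) \bullet X - \sum_{t \in [T]} (A_{i_t} + Z_t) \bullet X_t \le 4\sqrt{2T}$$

Adding and subtracting $\sum_{t=1}^T b_{i_t}$ gives

$$\max_{X \in \mathbb{B}_F} \sum_{t \in [T]} (A_{i_t} \bullet X - b_{i_t} + Z_t \bullet X) - \sum_{t \in [T]} (A_{i_t} \bullet X_t - b_{i_t} + Z_t \bullet X_t) \le 4\sqrt{2T}$$

Since we assume that there exists a feasible solution $X^* \in \mathbb{B}_F$, the maximum on the left-hand side is nonnegative, and we have that

$$\sum_{t \in [T]} (A_{i_t} \bullet X_t - b_{i_t} + Z_t \bullet X_t) \ge -4\sqrt{2T} \tag{3}$$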

Turning to the MW part of the algorithm, by Lemma 4.1 and using the clipping of $v_t(i)$ we have

$$\sum_{t \in [T]} p_t^\top v_t \le \min_{p \in \Delta_{m+1}} \sum_{t \in [T]} p^\top v_t + (\log m)/\eta + \eta \sum_{t \in [T]} p_t^\top v_t^2$$


By Lemma 4.2 (i), with high probability and for any $i \in [m]$,

$$\sum_{t \in [T]} v_t(i) \le \sum_{t \in [T]} (A_i \bullet X_t - b_i) + 3\eta T$$

Thus with high probability it holds that

$$\sum_{t \in [T]} p_t^\top v_t \le \min_{p \in \Delta_{m+1}} \sum_{t \in [T]} \sum_{i=1}^m p_i (A_i \bullet X_t - b_i) + (\log m)/\eta + \eta \sum_{t \in [T]} p_t^\top v_t^2 + 3\eta T$$


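Applying Lemma 4.2 (ii) we get that with high probability

$$\sum_{t=1}^T (A_{i_t} \bullet X_t - b_{i_t}) \le \min_{p \in \Delta_{m+1}} \sum_{t \in [T]} \sum_{i=1}^m p_i (A_i \bullet X_t - b_i) + (\log m)/\eta + \eta \sum_{t \in [T]} p_t^\top v_t^2 + 7\eta T$$

Adding $\sum_{t=1}^T Z_t \bullet X_t$ to both sides of the inequality and using (3) yields

$$\min_{p \in \Delta_{m+1}} \sum_{t \in [T]} \left( \sum_{i=1}^m p_i (A_i \bullet X_t - b_i) + Z_t \bullet X_t \right) \ge -4\sqrt{2T} - (\log m)/\eta - \eta \sum_{t \in [T]} p_t^\top v_t^2 - 7\eta T \tag{4}$$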



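It holds that

$$\sum_{t=1}^T Z_t \bullet X_t \le \sum_{t=1}^T \min_{Z \in \mathbb{S}_+} (Z \bullet X_t + \epsilon') \le \min_{Z \in \mathbb{S}_+} \sum_{t=1}^T (Z \bullet X_t + \epsilon')$$

Plugging the last inequality into (4) gives

$$\min_{p \in \Delta_{m+1},\, Z \in \mathbb{S}_+} \sum_{t \in [T]} \left( \sum_{i=1}^m p_i (A_i \bullet X_t - b_i) + Z \bullet X_t \right) \ge -4\sqrt{2T} - (\log m)/\eta - \eta \sum_{t \in [T]} p_t^\top v_t^2 - 7\eta T - \epsilon' T \tag{5}$$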

By a simple Markov inequality argument it holds that w.p. at least 3/4,

$$\sum_{t \in [T]} p_t^\top v_t^2 \le 4T$$

Plugging this bound into (5) and dividing through by $T$ gives with probability at least 1/2

$$\min_{p \in \Delta_{m+1},\, Z \in \mathbb{S}_+} \sum_{i=1}^m p_i (A_i \bullet \bar{X} - b_i) + Z \bullet \bar{X} \ge -\frac{4\sqrt{2}}{\sqrt{T}} - (\log m)/(\eta T) - 11\eta - \epsilon'$$

The theorem follows from plugging the values of $T$, $\eta$ and $\epsilon'$.



□ 
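The last step of the proof is plain arithmetic, and can be checked numerically. The sketch below evaluates the four error terms of the final bound for one illustrative parameter setting; the function name `additive_error` and the specific values of $\epsilon$, $m$, $\eta$ and $T$ are assumptions chosen for this example, not values asserted by the text.

```python
import math


def additive_error(T, eta, eps_prime, m):
    """Sum of the error terms 4*sqrt(2)/sqrt(T) + log(m)/(eta*T) + 11*eta + eps'."""
    return (4.0 * math.sqrt(2.0) / math.sqrt(T)
            + math.log(m) / (eta * T)
            + 11.0 * eta
            + eps_prime)


# Illustrative (assumed) setting: eta proportional to eps, T of order log(m) / eta^2.
eps, m = 0.1, 1000
eta = eps / 20.0
T = math.ceil(40.0 * math.log(m) / eta ** 2)
err = additive_error(T, eta, eps / 4.0, m)
```

For this setting the dominant terms are $11\eta$ and $\epsilon'$, and the total stays below $\epsilon$, which is what Claim 2.1 needs in order to conclude that $\bar{X}$ is an $\epsilon$-approximated solution.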



The algorithm performs $O(\epsilon^{-2} \log m)$ iterations. Each iteration includes a primal gradient update step which takes $O(n^2)$ time to compute, an update of the distribution over constraints using a single sample per constraint which takes $O(m)$ time, and the computation of a single eigenvalue up to an $O(\epsilon)$ approximation, which using the Lanczos method takes at most $O\left( \frac{n^2 \log n}{\sqrt{\epsilon}} \right)$ time (see [4], Theorem 3.2). Overall the running time is as stated in Theorem 1.2.

5 Lower Bound 

In this section we prove Theorem 1.3. Our proof relies on an information theoretic
argument as follows: We show that it is possible to generate two random SDP 
instances such that one is feasible and the other one is far from being feasible. We 
show that these two random instances differ only by a single entry chosen also at 
random. Any successful algorithm must distinguish between these two instances 
and thus must read the single distinguishing entry which requires any algorithm 
to read a constant factor of the total number of relevant entries in order to succeed 
with constant probability. 

We split our random construction into the following two lemmas. 

Lemma 5.1. Under the conditions stated in Theorem 1.3, any successful algorithm must read $\Omega\left( \frac{m}{\epsilon^2} \right)$ entries from the input.

Proof. Assume that $n \ge \frac{1}{2\epsilon}$. Consider the following random instance. With probability 1/2 each of the constraint matrices $A_i$ has a single randomly chosen entry $(j, l) \in [\frac{1}{2\epsilon}] \times [\frac{1}{2\epsilon}]$ that equals $\sqrt{1 - \zeta^2 \left( \frac{1}{4\epsilon^2} - 1 \right)}$ and all other entries take random values from the interval $[0, \zeta]$ (the goal of these values is to prevent a sparse representation of the input). With the remaining probability of 1/2, all constraint matrices are exactly as before except for a single constraint matrix (chosen uniformly at random) that has all of its entries chosen at random from $[0, \zeta]$. In both cases for each constraint matrix $A_i$, $i \in [m]$ it holds that $\|A_i\|_F \le 1$.
In the second case it clearly holds that for all $X \in \mathbb{B}_F$,

$$\min_{i \in [m]} A_i \bullet X \le \sqrt{\frac{1}{4\epsilon^2} \cdot \zeta^2} = \frac{\zeta}{2\epsilon}$$

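In the first case we can construct a solution matrix $X^*$ as follows: for each $(j, l) \in [\frac{1}{2\epsilon}] \times [\frac{1}{2\epsilon}]$, $X^*(j, l) = 2\epsilon$, and $X^*$ is zero elsewhere. Clearly $X^*$ is positive semidefinite (since it is a symmetric rank-one matrix) and $\|X^*\|_F = 1$. For each $i \in [m]$ it holds that

$$A_i \bullet X^* \ge 2\epsilon \sqrt{1 - \zeta^2 \left( \frac{1}{4\epsilon^2} - 1 \right)}$$

By choosing $\zeta = \epsilon^2$ and in both cases $b_i = 1.6\epsilon$ for all $i \in [m]$ we have that in the first case

$$\min_{i \in [m]} A_i \bullet X^* - b_i \ge 2\epsilon \sqrt{1 - \epsilon^4 \left( \frac{1}{4\epsilon^2} - 1 \right)} - 1.6\epsilon \ge (\sqrt{3} - 1.6)\epsilon > 0.1\epsilon$$

In the second case, for all $X \in \mathbb{B}_F$ it holds that

$$\min_{i \in [m]} A_i \bullet X - b_i \le \frac{\epsilon}{2} - 1.6\epsilon = -1.1\epsilon$$

Thus the first instance is feasible while the second one does not admit an $\epsilon$-approximated solution, and the two instances differ by a single randomly chosen entry. □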
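The gap between the two cases of the construction can be checked numerically for concrete values of $\epsilon$. The sketch below only reproduces the arithmetic of the proof with $\zeta = \epsilon^2$ and $b_i = 1.6\epsilon$; the function name `lemma_5_1_slacks` is hypothetical.

```python
import math


def lemma_5_1_slacks(eps):
    """Return (feasible_slack, infeasible_slack) for the two random instances.

    feasible_slack lower-bounds A_i . X* - b_i in the first case;
    infeasible_slack upper-bounds min_i A_i . X - b_i in the second case.
    """
    zeta = eps ** 2
    k = 1.0 / (4.0 * eps ** 2)  # number of relevant entries, (1/(2*eps))^2
    b = 1.6 * eps
    planted = math.sqrt(1.0 - zeta ** 2 * (k - 1.0))  # value of the planted entry
    feasible_slack = 2.0 * eps * planted - b
    infeasible_slack = zeta * math.sqrt(k) - b  # Cauchy-Schwarz with ||X||_F <= 1
    return feasible_slack, infeasible_slack


slacks = {eps: lemma_5_1_slacks(eps) for eps in (0.5, 0.1, 0.01)}
```

In every case the first slack is above $0.1\epsilon$ and the second equals $-1.1\epsilon$, so an $\epsilon$-approximated solution exists only for the first instance.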

Lemma 5.2. Under the conditions stated in Theorem 1.3, any successful algorithm must read $\Omega\left( \frac{n^2}{\epsilon^2} \right)$ entries from the input.

Proof. The proof follows the lines of the previous proof. Assume that $m \ge \frac{1}{16\epsilon^2}$, $\epsilon = \Omega(1/\sqrt{n})$ and that $n$ is even. Let $p, q \in \mathbb{N}^n$ be two random permutations over the integers $1, \dots, n/2$ and finally set $q_i = \frac{n}{2} + q_i$. Consider the following random instance composed of constraint matrices $A_i$, $i \in [\frac{1}{16\epsilon^2}]$. With probability 1/2, for each $A_i$ we set the entry $A_i(p_i, q_i)$ to equal $\sqrt{1 - \zeta^2 (n^2 - 1)}$ and all other entries in $A_i$ are sampled uniformly from $[0, \zeta]$. With the other 1/2 probability, all matrices are as before with the difference that we randomly pick a matrix $A_j$, $j \in [m]$ and set $A_j(p_j, q_j)$ to a value sampled uniformly from $[0, \zeta]$. In both cases it holds that $\|A_i\|_F \le 1$ for all $i \in [m]$.

In the second case it holds for all $X \in \mathbb{B}_F$ that

$$\min_{i \in [m]} A_i \bullet X \le n\zeta$$

In the first case we construct a solution $X^*$ as follows. For every $i \in [m]$ we define a matrix $X_i^*$ such that $X_i^*(p_i, q_i) = X_i^*(q_i, p_i) = X_i^*(p_i, p_i) = X_i^*(q_i, q_i) = 2\epsilon$ and $X_i^*$ is zero elsewhere. Finally we take $X^* = \sum_{i=1}^m X_i^*$.

Notice that $X^*$ is the sum of symmetric rank-one matrices and thus it is positive semidefinite.

Since $p, q$ are both permutations over disjoint sets we have that for every $(i, j) \in [n] \times [n]$ it holds that $|X^*(i, j)| \le 2\epsilon$ and thus $\|X^*\|_F^2 \le \frac{1}{16\epsilon^2} \cdot 4 \cdot 4\epsilon^2 = 1$.
By construction it holds for every $i \in [m]$ that

$$A_i \bullet X^* \ge 2\epsilon \sqrt{1 - \zeta^2 (n^2 - 1)}$$

By choosing $\zeta = \frac{\epsilon}{2n}$ and in both cases $b_i = 1.6\epsilon$ for all $i \in [m]$ we have that in the first case

$$\min_{i \in [m]} A_i \bullet X^* - b_i \ge 2\epsilon \sqrt{1 - \frac{\epsilon^2}{4n^2} (n^2 - 1)} - 1.6\epsilon \ge (\sqrt{3} - 1.6)\epsilon > 0.1\epsilon$$

In the second case, for all $X \in \mathbb{B}_F$ it holds that

$$\min_{i \in [m]} A_i \bullet X - b_i \le \frac{\epsilon}{2} - 1.6\epsilon = -1.1\epsilon$$

Thus as before, the first instance is feasible while the second one does not have an $\epsilon$ additive approximated solution, and the two instances differ by a single entry. Notice however that unlike the previous lemma, in this case because of the nature of our random construction, after reading $k$ matrices it suffices for an algorithm searching for the distinguishing entry to search only $(\frac{n}{2} - k)^2$ entries in the next matrix. Nevertheless, by plugging the values of $m$ and the lower bound on $\epsilon$ we get that $(\frac{n}{2} - m)^2 = \Omega(n^2)$ and thus any algorithm must still read an order of $n^2$ entries from each matrix.

□

A Martingale and concentration lemmas

We first prove a lemma on the expectation of clipped random variables. 

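Lemma A.1. Let $X$ be a random variable, let $\bar{X} = \mathrm{clip}(X, C) = \min\{C, \max\{-C, X\}\}$ and assume that $|\mathbb{E}[X]| \le C/2$ for some $C > 0$. Then

$$|\mathbb{E}[\bar{X}] - \mathbb{E}[X]| \le \frac{2}{C} \mathrm{Var}[X].$$

Proof. As a first step, note that for $x > C$ we have $x - \mathbb{E}[X] \ge C/2$, so that

$$C(x - C) \le 2(x - \mathbb{E}[X])(x - C) \le 2(x - \mathbb{E}[X])^2.$$

Hence, we obtain

$$\begin{aligned}
\mathbb{E}[X] - \mathbb{E}[\bar{X}] &= \int_{x < -C} (x + C)\, d\mu_X + \int_{x > C} (x - C)\, d\mu_X \\
&\le \int_{x > C} (x - C)\, d\mu_X \\
&\le \frac{2}{C} \int_{x > C} (x - \mathbb{E}[X])^2\, d\mu_X \\
&\le \frac{2}{C} \mathrm{Var}[X].
\end{aligned}$$

Similarly one can prove that $\mathbb{E}[X] - \mathbb{E}[\bar{X}] \ge -2\mathrm{Var}[X]/C$, and the result follows.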

□ 
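Lemma A.1 can be sanity-checked on a small discrete distribution. The sketch below computes the clipping bias and the bound $2\mathrm{Var}[X]/C$ exactly; the distribution and the helper name `clipping_bias_and_bound` are made up for illustration.

```python
def clip(x, c):
    """clip(x, c) = min(c, max(-c, x))."""
    return min(c, max(-c, x))


def clipping_bias_and_bound(values, probs, c):
    """Return (|E[clip(X)] - E[X]|, 2*Var[X]/c, E[X]) for a discrete X."""
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    clipped_mean = sum(p * clip(x, c) for x, p in zip(values, probs))
    return abs(clipped_mean - mean), 2.0 * var / c, mean


# A heavy right tail: the value 10 is clipped down to C = 2.
bias, bound, mean = clipping_bias_and_bound([-1.0, 0.0, 10.0], [0.45, 0.45, 0.1], c=2.0)
```

The premise $|\mathbb{E}[X]| \le C/2$ holds here (the mean is 0.55), and the bias of 0.8 is well within the bound.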

The following lemmas are used to prove Lemma 4.2.

In the following we assume only that $v_t(i) = \mathrm{clip}(\tilde{v}_t(i), 1/\eta)$ is the clipping of a random variable $\tilde{v}_t(i)$, the conditional variance of $\tilde{v}_t(i)$ is at most one ($\mathrm{Var}[\tilde{v}_t(i) \mid X_t] \le 1$) and we use the notation $\mu_t(i) = \mathbb{E}[\tilde{v}_t(i) \mid X_t] = A_i \bullet X_t - b_i$. We also assume that the expectations of $\tilde{v}_t(i)$ are bounded in absolute value by a constant, $|\mu_t(i)| = |A_i \bullet X_t - b_i| \le C$, such that $2 \le 2C \le 1/\eta$.

These lemmas are based on an application of Freedman's inequality, a Bernstein-like concentration inequality for martingales, which we now state:

Lemma A.2 (Freedman's inequality). Let $\xi_1, \dots, \xi_T$ be a martingale difference sequence with respect to a certain filtration $\{S_t\}$, that is $\mathbb{E}[\xi_t \mid S_t] = 0$ for every $t$. Assume also that for every $t$ it holds that $|\xi_t| \le V$ and $\mathbb{E}[\xi_t^2 \mid S_t] \le s$. Then

$$P\left( \left| \sum_{t=1}^T \xi_t \right| \ge \epsilon \right) \le 2 \exp\left( -\frac{\epsilon^2/2}{Ts + V\epsilon/3} \right)$$

Lemma A.3. For $\frac{1}{2C} \ge \eta \ge \sqrt{\frac{4 \log(2m^2)}{T}}$ it holds with probability at least $1 - \frac{1}{m}$ that

$$\max_{i \in [m]} \left| \sum_{t=1}^T v_t(i) - \mu_t(i) \right| \le 3\eta T$$



Proof. Given $i \in [m]$, consider the martingale difference sequence $\xi_t^i = v_t(i) - \mathbb{E}[v_t(i)]$ with respect to the filtration $S_t = (X_t)$.

It holds that for all $t$, $|\xi_t^i| \le \frac{2}{\eta}$ and $\mathbb{E}[(\xi_t^i)^2 \mid S_t] \le 1$. Applying Freedman's inequality we get

$$P\left( \left| \sum_{t=1}^T \xi_t^i \right| \ge \eta T \right) \le 2 \exp\left( -\frac{\eta^2 T^2/2}{T + (2/\eta)\eta T/3} \right) \le 2 \exp(-\eta^2 T/4)$$

Using Lemma A.1, the fact that $v_t(i)$ is the clipping of $\tilde{v}_t(i)$ and the triangle inequality we have,

$$P\left( \left| \sum_{t=1}^T v_t(i) - \mu_t(i) \right| \ge 3\eta T \right) \le 2 \exp(-\eta^2 T/4)$$

Thus for $\eta \ge \sqrt{\frac{4 \log(2m^2)}{T}}$ we have that with probability at least $1 - \frac{1}{m^2}$

$$\left| \sum_{t=1}^T v_t(i) - \mu_t(i) \right| \le 3\eta T$$

The lemma follows from taking the union bound over all $i \in [m]$. □
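The way Freedman's inequality is instantiated in this proof can be checked numerically. The sketch below evaluates the tail bound with deviation $\eta T$, $s = 1$ and $V = 2/\eta$, and compares it with the simplified form $2\exp(-\eta^2 T/4)$; the function names and the sample values of $m$ and $T$ are assumptions of this example.

```python
import math


def freedman_bound(dev, T, s, V):
    """Right-hand side 2*exp(-(dev^2/2) / (T*s + V*dev/3)) of Freedman's inequality."""
    return 2.0 * math.exp(-(dev ** 2 / 2.0) / (T * s + V * dev / 3.0))


def lemma_a3_tail(eta, T):
    """Freedman bound with deviation eta*T, s = 1 and V = 2/eta."""
    return freedman_bound(eta * T, T, 1.0, 2.0 / eta)


m, T = 50, 10000
eta = math.sqrt(4.0 * math.log(2.0 * m ** 2) / T)  # smallest eta allowed by the lemma
tail = lemma_a3_tail(eta, T)
simplified = 2.0 * math.exp(-eta ** 2 * T / 4.0)
```

At the smallest admissible $\eta$ the simplified bound equals $1/m^2$ per constraint, which is what makes the union bound over all $m$ constraints give failure probability at most $1/m$.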



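Lemma A.4. For $\frac{1}{2C} \ge \eta \ge \sqrt{\frac{4 \log(2m)}{T}}$ it holds with probability at least $1 - \frac{1}{m}$ that

$$\left| \sum_{t \in [T]} p_t^\top v_t - \sum_{t \in [T]} p_t^\top \mu_t \right| \le 3\eta T$$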



Proof. This lemma is proven in essentially the same manner as Lemma A.3, and the proof is given below for completeness.

Consider the martingale difference sequence $\xi_t = p_t^\top v_t - \mathbb{E}[p_t^\top v_t]$ with respect to the filtration $S_t = (X_t, p_t)$.

It holds for all $t$ that $|\xi_t| \le \frac{2}{\eta}$. Also by convexity it holds that $\mathbb{E}[\xi_t^2 \mid S_t] \le \mathbb{E}[(p_t^\top v_t)^2 \mid S_t] \le \sum_{i=1}^m p_t(i)\, \mathbb{E}[v_t(i)^2 \mid S_t] \le 1$.

Applying Freedman's inequality we have,

$$P\left( \left| \sum_{t \in [T]} \xi_t \right| \ge \eta T \right) \le 2 \exp(-\eta^2 T/4)$$

Using Lemma A.1, the fact that $v_t(i)$ is the clipping of $\tilde{v}_t(i)$ and the triangle inequality we have,

$$P\left( \left| \sum_{t \in [T]} p_t^\top v_t - p_t^\top \mu_t \right| \ge 3\eta T \right) \le 2 \exp(-\eta^2 T/4)$$

Thus for $\eta \ge \sqrt{\frac{4 \log(2m)}{T}}$ the lemma follows. □



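Lemma A.5. For $\frac{1}{2C} \ge \eta \ge \sqrt{\frac{10C \log(2m)}{T}}$, with probability at least $1 - 1/m$,

$$\left| \sum_{t \in [T]} \mu_t(i_t) - \sum_{t \in [T]} p_t^\top \mu_t \right| \le \eta T$$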



Proof. Consider the martingale difference $\xi_t = \mu_t(i_t) - p_t^\top \mu_t$, where now $\mu_t$ is a constant vector and $i_t$ is the random variable, and consider the filtration given by $S_t = (X_t, p_t)$.

The expectation of $\mu_t(i_t)$, conditioning on $S_t$ with respect to the random choice of the index $i_t$, is $p_t^\top \mu_t$. Hence $\mathbb{E}[\xi_t \mid S_t] = 0$.

It holds that $|\xi_t| \le |\mu_t(i)| + |p_t^\top \mu_t| \le 2C$. Also $\mathbb{E}[\xi_t^2] = \mathbb{E}[(\mu_t(i) - p_t^\top \mu_t)^2] \le 2\mathbb{E}[\mu_t(i)^2] + 2(p_t^\top \mu_t)^2 \le 4C^2$.
Applying Freedman's inequality gives,

$$P\left( \left| \sum_{t \in [T]} \xi_t \right| \ge \eta T \right) \le 2 \exp\left( -\frac{\eta^2 T^2/2}{4C^2 T + 2C\eta T/3} \right) \le 2 \exp(-\eta^2 T/(10C^2))$$

where for the last inequality we use $C \ge 1$ and $\eta \le \frac{1}{2C}$.

Thus for $\eta \ge \sqrt{\frac{10C \log(2m)}{T}}$ the lemma follows. □



Setting $C = 2$ and taking $\eta$ in the range required by Lemma 4.2, Lemma A.3 yields part (i) of Lemma 4.2, and combining Lemmas A.4 and A.5 via the triangle inequality yields part (ii) of Lemma 4.2.